An Alternate GPU-Accelerated Algorithm for Very Large Sparse LU Factorization

Abstract

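The LU factorization of very large sparse matrices requires a significant amount of computing resources, including memory and broadband communication. A hybrid MPI + OpenMP + CUDA algorithm named SuperLU3D can efficiently compute the LU factorization with GPU acceleration. However, this algorithm faces difficulties when dealing with very large sparse matrices on limited GPU resources. Factorizing very large matrices involves a vast amount of nonblocking communication between processes, often leading to a break in the calculation due to the overflow of cluster communication buffers. In this paper, we present an improved GPU-accelerated algorithm named SuperLU3D_Alternate for the LU factorization of very large sparse matrices with fewer GPU resources. The basic idea is “divide and conquer”, which means dividing a very large matrix into multiple submatrices, performing LU factorization on each submatrix, and then assembling the factorized results of all submatrices into two complete matrices L and U. In detail, according to the number of available GPUs, a very large matrix is first divided into multiple submatrices using the elimination tree. Then, the LU factorization of each submatrix is alternately computed with the limited GPU resources, and its intermediate LU factors from the GPUs are saved to the host memory or hard disk. Finally, after finishing the LU factorization of all submatrices, these factorized submatrices are assembled into a complete lower triangular matrix L and a complete upper triangular matrix U, respectively. The SuperLU3D_Alternate algorithm is suitable for hybrid CPU/GPU cluster systems, especially those with a subset of nodes without GPUs. To accommodate the different hardware resources of various clusters, we designed the algorithm to run in the following three cases: sufficient memory on the GPU nodes, insufficient memory on the GPU nodes, and insufficient memory on the entire cluster. The test cases show that the larger the matrix is, the more efficient the algorithm is under the same GPU memory consumption. In our numerical experiments, SuperLU3D_Alternate achieves speedups of up to 8× over SuperLU3D (CPU only) and 2.5× over SuperLU3D (CPU + GPU) on a hybrid cluster with six Tesla V100S GPUs. Furthermore, when the matrix is too big to be handled by SuperLU3D, SuperLU3D_Alternate can still utilize the cluster’s host memory and hard disk to solve it. By reducing the amount of data exchange to prevent exceeding the buffer’s limit in cluster MPI nonblocking communication, our algorithm enhances the stability of the program.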
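The divide, factorize, and assemble steps described in the abstract can be illustrated with a small dense sketch. This is not the SuperLU3D_Alternate implementation: it assumes a matrix whose elimination tree has independent leaf blocks coupled only through one separator block, uses an unpivoted LU on a diagonally dominant matrix, and lets a Python dictionary stand in for the host memory or hard disk. All function and variable names (`lu_nopivot`, `alternate_lu`, `store`, `build_demo`) are hypothetical.

```python
import numpy as np

def lu_nopivot(a):
    """Doolittle LU without pivoting: a = l @ u, with l unit lower triangular."""
    n = a.shape[0]
    l = np.eye(n)
    u = a.astype(float).copy()
    for k in range(n - 1):
        l[k + 1:, k] = u[k + 1:, k] / u[k, k]
        u[k + 1:, :] -= np.outer(l[k + 1:, k], u[k, :])
    return l, np.triu(u)

def alternate_lu(a, leaves, sep):
    """Factor independent leaf blocks one at a time, then the separator.

    leaves: slices of independent diagonal blocks (no coupling between them).
    sep:    slice of the separator block that couples all leaves.
    """
    n = a.shape[0]
    store = {}  # assumption: stands in for host memory or hard disk
    schur = a[sep, sep].astype(float).copy()
    for i, blk in enumerate(leaves):  # processed alternately, as if sharing one GPU
        l_ii, u_ii = lu_nopivot(a[blk, blk])
        l_si = np.linalg.solve(u_ii.T, a[sep, blk].T).T  # A_si @ inv(U_ii)
        u_is = np.linalg.solve(l_ii, a[blk, sep])        # inv(L_ii) @ A_is
        schur -= l_si @ u_is                             # Schur complement update
        store[i] = (l_ii, u_ii, l_si, u_is)              # offload intermediate factors
    l_ss, u_ss = lu_nopivot(schur)

    # Assemble the saved pieces into the complete factors L and U.
    L = np.zeros((n, n))
    U = np.zeros((n, n))
    for i, blk in enumerate(leaves):
        l_ii, u_ii, l_si, u_is = store[i]
        L[blk, blk], U[blk, blk] = l_ii, u_ii
        L[sep, blk], U[blk, sep] = l_si, u_is
    L[sep, sep], U[sep, sep] = l_ss, u_ss
    return L, U

def build_demo():
    """8x8 diagonally dominant matrix with two 3x3 leaves and a 2x2 separator."""
    rng = np.random.default_rng(0)
    a = rng.uniform(-1.0, 1.0, size=(8, 8)) + 10.0 * np.eye(8)
    leaves, sep = [slice(0, 3), slice(3, 6)], slice(6, 8)
    a[leaves[0], leaves[1]] = 0.0  # leaves are decoupled from each other
    a[leaves[1], leaves[0]] = 0.0
    return a, leaves, sep

A, leaves, sep = build_demo()
L, U = alternate_lu(A, leaves, sep)
print(np.allclose(L @ U, A))
```

In this sketch only one leaf block is resident in the "device" step at a time, which is the property the alternate scheme relies on; the real algorithm works on sparse supernodal data and distributed processes, which the sketch does not attempt to model.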

Related resources

GPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis

Lower-upper (LU) factorization for sparse matrices is the most important computing step for circuit simulation problems. However, parallelizing LU factorization on graphics processing units (GPUs) turns out to be a difficult problem due to intrinsic data dependence and irregular memory access, which diminish GPU computing power. In this paper, we propose a new sparse LU solver on GPUs for ci...

An Unsymmetric-Pattern Multifrontal Method for Sparse LU Factorization

Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops by using dense matrix kernels. However, no efficient LU factorization algorithm based primarily ...

Parallel LU Factorization on GPU Cluster

This paper describes our progress in developing software for performing parallel LU factorization of a large dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation...

LU-AD1 Factorization Algorithm

Parallel Symbolic Factorization for Sparse LU Factorization with Static Pivoting

In this paper we consider a direct method for solving a sparse unsymmetric system of linear equations Ax = b, namely Gaussian elimination. This elimination consists of explicitly factoring the matrix A into the product of L and U, where L is a unit lower triangular matrix and U is an upper triangular matrix, followed by solving LUx = b one factor at a time. One of the main characteristics ...

Journal

Journal title: Mathematics

Year: 2023

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math11143149